Abstract

Language extensions of Fortran are being developed which permit the user to map 
data structures to the individual processors of distributed memory machines. These 
languages allow a programming style in which global data references are used. Current 
efforts are focussed on designing a common basis for such languages, the result of which 
is known as High Performance Fortran (HPF). One of the central debates in the HPF 
effort revolves around the concept of templates, introduced as an abstract index space 
to which data could be aligned. In this paper, we present a model for the mapping 
of data which provides the functionality of High Performance Fortran distributions 
without the use of templates. 


*Research supported by the National Aeronautics and Space Administration under NASA contract
NAS1-19480 while the authors were in residence at ICASE, Mail Stop 132C, NASA Langley Research Center,
Hampton, VA 23681, and also by the Austrian Research Foundation (FWF) and the Austrian Ministry for
Science and Research. This paper is partially based on Chapter 3 of the Version 0.2 draft HPF specification
[8].
1 Introduction


Much current research activity is concentrated on providing suitable programming tools for 
distributed-memory architectures. One focus is on the provision of appropriate high-level 
language constructs to enable users to design programs in much the same way as they are 
accustomed to on a sequential machine. Several proposals have been put forth in recent 
months for a set of language extensions to achieve this [3, 4, 5, 6, 10], in particular (but not 
only) for Fortran. 

Recently, a coalition of researchers from industry, government labs and academia formed 
the High Performance Fortran Forum to develop a standard set of extensions for Fortran 
90 which would provide a portable interface to a wide variety of parallel architectures. The 
forum has produced a draft proposal for a language, called High Performance Fortran (HPF), 
which focuses mainly on issues of distributing data across the memories of a distributed 
memory multiprocessor. 

High Performance Fortran (HPF) adds directives to Fortran 90 to allow the user to 
advise the compiler on the allocation of data objects to processor memories. The three basic 
elements of the model are: 

• abstract processors, 

• distributions, which are mappings of objects to abstract processors, 

• alignments, which are mappings of data objects to other objects. 

The distribution of an object (usually an array) specifies a mapping of the index domain 
associated with the object to the index domain of a set of abstract processors. This may 
be specified by the user: a) directly, by explicitly specifying suitable directives, or b) 
indirectly, by using an alignment that relates the index domain of the array to the index 
domain of another object whose distribution is known. 

The HPF directives provide a way to direct the compiler to ensure that certain data 
objects will reside in the same processor. The underlying motivation is that an operation 
on two or more data objects is likely to be carried out much faster if they all reside in the 
same processor, and, furthermore, it may be possible to carry out several such operations 
concurrently if they can be performed on different processors. 

Alignment can serve as a bundling mechanism: once many arrays are aligned to the same
object, then they can be distributed onto a processor arrangement with a single statement. 

In general, arrays are aligned to other arrays. However, HPF has introduced the concept 
of templates to be used as an alignment base. As stated in the HPF language specification [8]: 
Sometimes it is desirable to consider a large index space with which several
smaller arrays are to be aligned, but not to declare any array that spans the 
entire index space. HPF provides the notion of a TEMPLATE, which is like 
an array whose elements have no content and therefore occupy no storage; it is 
merely an abstract index space that can be distributed and with which arrays may 
be aligned. 

The problem with this approach is that even though it is useful in some special situations, 
the concept of templates necessarily complicates the whole underlying semantic model. Since 
templates are not first class objects in the language (they can occur only in directives), they 
cannot be passed across procedure boundaries, and thus cannot be used to describe the 
distributions and alignments of procedure arguments. Also, as currently defined, the size 
of templates has to be a specification expression and hence templates cannot be used for 
describing the alignment of Fortran 90 allocatable arrays. 

In this paper, we show that the HPF distribution and alignment model can be defined 
in a clear and concise manner without templates, while retaining the intended functionality. 

The major differences between the current HPF draft [8] and the language proposed in 
this paper are as follows. The model has been simplified by: 

1. Removing template directives. 

2. Limiting the height of alignment trees to 1.

3. Clarifying the role of processors by establishing a language-defined mapping to an
implementation-specific abstract processor arrangement.

4. Simplifying the passing of arguments to procedures by eliminating the INHERIT
attribute, matching alignments, and the TO-clause for dummy arguments.

At the same time, the language has been significantly generalized with the objective of 
improving object program performance. In particular: 

1. Arrays may be distributed to processor sections. 

2. The set of distribution functions has been extended by including GENERAL_BLOCK.
This allows the specification of irregular block distributions, which are important for 
the support of load balancing, and can be implemented efficiently [13]. 

3. The concept of distribution functions has been defined in a general way so that future 
language standards may easily incorporate more general mappings. 
The paper is organized as follows. In the next section we describe the model and terminology
underlying our proposal. The subsequent sections introduce the main language
extensions: processors, distribution directives and alignment directives. Issues involving
allocatable arrays and procedures are treated separately. We then discuss the issues arising
due to HPF templates and conclude with a discussion of related work.

2 Model 

2.1 Index Domains 

An index domain I of rank (dimension) n is an ordered set of subscript tuples that can be 
represented by a subscript-triplet-list of length n (see Fortran 90 specification, R619). Each 
element of an index domain is called an index; it represents an n-dimensional arrangement 
of values. I is called a standard index domain iff the stride in each subscript triplet is 1. 

Let A denote a declared data array (or processor array) that has been created. Then A
is associated with a standard index domain which we denote by $I^A$.

2.2 Distributions 

A distribution of an array maps each array element to one or more processors which become 
the owners of the element and, in this capacity, store the element in their local memory. 
We model distributions by mappings between the associated index domains. 

Definition 1 Index Mappings 

Let $I$, $J$ denote two index domains. An index mapping from $I$ to $J$ is a total function

$$\iota : I \to \mathcal{P}(J) - \{\emptyset\}$$

where $\mathcal{P}(J)$ denotes the powerset of $J$.

Definition 2 Distributions 

Let A denote an array, and R a processor array. An index mapping $\delta_R^A$ from $I^A$ to $I^R$ is
called a distribution function for A with respect to R.

A distribution function $\delta_R^A$, which is a mapping between index domains, induces an
associated element-based distribution that maps elements of A to one or more abstract
processors.*

Note that scalars can easily be accommodated in our model by treating them as if they 
were associated with an index domain consisting of exactly one element. 

*Note that replication can be modeled as a special case of distribution, since every array element can be
distributed to an arbitrary (positive) number of processors. 
2.3 Alignment

Definition 3 Let A, B denote arbitrary arrays. An index mapping $\alpha$ from $I^A$ to $I^B$ is
called an alignment function for A with respect to B.

Definition 4 Construction of a distribution 

If $A$, $B$, $\alpha$, and $\delta_R^B : I^B \to \mathcal{P}(I^R) - \{\emptyset\}$ are given as above, then $\delta_R^A$ can be determined as
follows: For each $i \in I^A$:

$$\delta_R^A(i) := \bigcup_{j \in \alpha(i)} \delta_R^B(j)$$

We will express this relationship below in the form

$$\delta_R^A = \mathrm{CONSTRUCT}(\alpha, \delta_R^B).$$

This can be verbally described as follows: if $i$ is an index of A which is mapped to an
index $j$ of B via the alignment function $\alpha$, then A(i) and B(j) are guaranteed to reside in
the same processor under any given distribution for B.
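
The construction can be sketched in a few lines of Python. This is a minimal illustration rather than part of the proposal: the names `construct`, `alpha`, and `delta_b` are hypothetical, and an index mapping is modeled simply as a function that returns a non-empty set.

```python
def construct(alpha, delta_b):
    """Return the distribution of A induced by aligning A to B.

    alpha   -- alignment function: index of A -> non-empty set of indices of B
    delta_b -- distribution of B:  index of B -> non-empty set of processors
    """
    def delta_a(i):
        owners = set()
        for j in alpha(i):          # every base index that i is aligned with
            owners |= delta_b(j)    # A(i) must live wherever B(j) lives
        return owners
    return delta_a


# Hypothetical example: B(1:8) is split into two blocks over 2 processors,
# and A(i) is aligned with B(2*i) for i in 1..4.
delta_b = lambda j: {1 if j <= 4 else 2}
alpha = lambda i: {2 * i}
delta_a = construct(alpha, delta_b)
owners = [delta_a(i) for i in range(1, 5)]   # owners of A(1) .. A(4)
```

If `alpha` returns more than one base index, the union makes the corresponding element of A replicated over all processors that own any of those base elements.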

2.4 The Alignment Relation 

For the following discussion, we consider the data space A of all arrays that are accessible 
in a given scope, and have been created, at a given time during the execution of a program 
unit. 

An alignment directive (see Section 5) establishes an alignment from an array $A_1$,
the alignee, to an array $A_2$, the alignment base. It defines an alignment function for $A_1$
with respect to $A_2$.

An HPF program must satisfy the following constraints: 

1. Each array occurring as an alignment base must not be aligned to another array. For 
such an array, a distribution must be specified directly. 

2. Each array occurring as an alignee can be aligned with only one alignment base. 

This enables us to represent A as an alignment forest, consisting of a set of alignment 
trees. The nodes in the alignment forest represent arrays, and there is a directed edge from 
B to A if and only if A is aligned to B. The height of alignment trees may be either 1 or 
0. An alignment tree of height 0 is called degenerate: it consists of exactly one node that 
represents an array which is not aligned to any other array, and to which no other array is 
aligned. 
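
The two constraints can be checked mechanically. The sketch below is only an illustration under stated assumptions: the function name `check_alignment_forest` and the dictionary representation (alignee name mapped to base name) are hypothetical choices, not part of the language.

```python
def check_alignment_forest(aligned_to):
    """Check the constraints on a set of alignments and group them into trees.

    aligned_to -- dict mapping each alignee name to its alignment base name.
    A dict already guarantees one base per alignee (constraint 2); this
    function checks that no alignment base is itself aligned (constraint 1).
    Returns a dict: primary array -> sorted list of its secondary arrays.
    Degenerate trees (unaligned arrays nobody aligns to) are not listed.
    """
    for alignee, base in aligned_to.items():
        if base in aligned_to:
            raise ValueError(
                f"{base} is an alignment base but is aligned to {aligned_to[base]}"
            )
    trees = {}
    for alignee, base in aligned_to.items():
        trees.setdefault(base, []).append(alignee)
    return {base: sorted(kids) for base, kids in trees.items()}


forest = check_alignment_forest({"A": "B", "C": "B", "E": "D"})
```

Because a base can never be an alignee, every tree produced this way has height exactly 1.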
Each alignment tree T has a uniquely defined root, which is called the primary array
of T. All other nodes of T are called secondary arrays. 

Let B denote a primary array. Then there is either a directive which explicitly specifies 
a distribution for B or B is implicitly distributed by the compiler. Primary arrays are the 
only arrays with this property. 

Let A denote an arbitrary secondary array of a tree with primary array B. Then there
exists an alignment function $\alpha$, describing the alignment from A to B. If $\delta_R^B$ is the distribution
of B, the distribution $\delta_R^A$ of A satisfies $\delta_R^A = \mathrm{CONSTRUCT}(\alpha, \delta_R^B)$.

After the specification part of a unit has been completely processed, the alignment forest 
can be constructed for the set of all arrays that are accessible and have already been created. 
This is the initial state for the actual alignment forest associated with the processing of the 
executable part of the program. The structure of the forest may change dynamically during 
execution as a result of executing REDISTRIBUTE and REALIGN directives, ALLOCATE 
and DEALLOCATE statements, and procedure calls. 

For the details of these manipulations see Sections 4.2, 5.2, and 7. Distribution and 
alignment functions are explained in Sections 4 and 5, respectively. 

3 The Processors Directive 

Each implementation of HPF determines uniquely an implicit abstract processor arrangement,
AP, which specifies a linear numbering scheme for the physical processors of
the underlying machine.

The PROCESSORS directive declares one or more processor arrangements, each of which
may be either a processor array arrangement or a conceptually scalar processor arrangement.

The specification of a processor arrangement determines the name and, in the case of a 
processor array arrangement, a non-empty index domain. It must appear in the specification 
part of a program unit. 

Each processor arrangement is mapped to AP in the same way as storage association is 
defined for the Fortran 90 EQUIVALENCE statement, with abstract processors playing the 
role of the storage units (see Fortran 90 specification, 5.5.1). The sharing of an abstract 
processor implies the sharing of the associated physical processor. 
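
As a hedged illustration of this mapping, the following Python sketch numbers the elements of a processor array arrangement in Fortran array element (column-major) order. It assumes that every arrangement is associated with AP starting at the first abstract processor; the function name `ap_number` and the arrangements `P` and `Q` are hypothetical.

```python
def ap_number(index, bounds):
    """Map an index of a processor array arrangement to its position in AP.

    index  -- tuple of subscripts, e.g. (1, 3)
    bounds -- list of (lower, upper) pairs, one per dimension
    Returns the 1-based position in column-major order.
    """
    position, stride = 1, 1
    for sub, (lo, hi) in zip(index, bounds):
        if not lo <= sub <= hi:
            raise IndexError("subscript out of bounds")
        position += (sub - lo) * stride
        stride *= hi - lo + 1
    return position


# Hypothetical arrangements P(1:8) and Q(1:2,1:4) over the same machine.
p5 = ap_number((5,), [(1, 8)])
q13 = ap_number((1, 3), [(1, 2), (1, 4)])
```

Under these assumptions P(5) and Q(1,3) receive the same number and therefore denote the same abstract (and physical) processor.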

Depending on the target architecture, data distributed to a (conceptually) scalar processor
arrangement may reside in a single control processor (if the machine has one), or
may reside in an arbitrarily chosen processor, or may be replicated over all processors. The
language does not specify a relationship between different scalar processor arrangements.
4 Distribution Directives


The DISTRIBUTE directive specifies the distribution (Section 2.2) of one or more arrays, 
the distributees, by establishing for each distributee a mapping between its index domain 
and the index domain of the distribution target, which is either a processor array or a 
section thereof. The distribution target is specified, after the keyword TO, in a TO-clause. 
The mapping between distributee and processor array can be specified either explicitly, 
as a distribution format list, or as an inherited distribution. The elements in the 
distribution format list are associated with the dimensions of the distributee; each element 
is one of the following: 

1. BLOCK

2. GENERAL_BLOCK (restricted-expression)

3. CYCLIC [(specification-expression)]

4. :

The meaning of these elements will be discussed below. Inherited distributions will be 
discussed in Section 7. 

Examples: 

!HPF$ DISTRIBUTE A (BLOCK) 

!HPF$ DISTRIBUTE B (CYCLIC) TO Q(1:N0P:2) 

!HPF$ DISTRIBUTE C(GENERAL_BLOCK(S)) 

!HPF$ DISTRIBUTE (BLOCK,:) :: E,F 

4.1 Determining an Array Distribution 

Let A denote an array of rank n which is not a dummy argument, and assume that R is
the associated distribution target (explicitly or implicitly specified). The distribution of A is
specified by a list of distribution formats. The length of this list must be n. A distribution
format ":" specifies that the corresponding array dimension is not being distributed. The
rank of R must be n, reduced by the number of colons in the distribution format list. The
non-colon entries in the distribution format list are matched from left to right to the dimensions
of R. For each such entry, a distribution function is determined according to the rules defined
below. Here we assume both the array and the processor array are one-dimensional, with
index domains $I^A = [1 : N]$ and $I^R = [1 : NP]$. We will define the functions associated
with the distribution formats by specifying the associated distributions, which will be simply
denoted by $\delta$.
4.1.1 Block Distributions

The block distribution function is specified by the distribution format BLOCK; it divides
the array into contiguous blocks whose sizes are identical, except possibly for the last block,
which may be of a smaller size. More precisely, let $q := \lceil N/NP \rceil$. Then:

• $\delta(i) = \{j\}$ for all $i$, $1 \le i \le N$, where $j = \lceil i/q \rceil$.

• The local index associated with element A(i) in processor R(j) is $i - (j - 1) * q$.
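
A minimal Python sketch of this rule follows, assuming the block size is q = ceil(N/NP) and that both the array and the processors are numbered from 1. The function name `block_owner` and the parameter names are hypothetical.

```python
def block_owner(i, n, np_):
    """Owner and local index of A(i) under BLOCK, for A(1:n) on R(1:np_).

    Assumes block size q = ceil(n / np_).
    """
    if not 1 <= i <= n:
        raise IndexError("index outside [1:n]")
    q = -(-n // np_)             # ceiling division
    j = -(-i // q)               # owning processor, 1-based
    local = i - (j - 1) * q      # position within the block, 1-based
    return j, local


# Example: 10 elements on 4 processors give blocks of size 3, 3, 3, 1.
owners = [block_owner(i, 10, 4) for i in range(1, 11)]
```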

4.1.2 General Block Distributions 

A distribution format for a general block distribution is of the form GENERAL_BLOCK(G),
where G is an integer array with index domain [1:M], where $M \ge NP - 1$.

A is partitioned into NP contiguous blocks. For all $i$, $1 \le i < NP$, G(i) specifies the upper
bound of block i. The index range associated with block 1 is [1 : G(1)]; for $1 < i < NP$,
[G(i - 1) + 1 : G(i)] is the index range of block i; and [G(NP - 1) + 1 : N] is the index range
of block NP.
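
The owner of an element under a general block distribution can be found by a binary search over the block upper bounds. The sketch below is an illustration only; it assumes that the bounds in G are non-decreasing and that only the first NP - 1 entries of G are consulted. The function name `general_block_owner` is hypothetical.

```python
from bisect import bisect_left


def general_block_owner(i, g, n, np_):
    """Owner of A(i) under GENERAL_BLOCK(G), for A(1:n) on R(1:np_).

    g -- Python list holding G(1), G(2), ...; only the first np_ - 1
         entries are used as block upper bounds, the last block ends at n.
    """
    if not 1 <= i <= n:
        raise IndexError("index outside [1:n]")
    uppers = list(g[:np_ - 1]) + [n]      # upper bound of each block
    return bisect_left(uppers, i) + 1     # first block whose upper bound >= i


# Example: G = (2, 7) splits A(1:10) into [1:2], [3:7], [8:10].
owners = [general_block_owner(i, [2, 7], 10, 3) for i in range(1, 11)]
```

Irregular block sizes of this kind are what makes the format useful for load balancing: a processor with more expensive elements can be given a shorter block.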

4.1.3 Cyclic Distributions 

Block-cyclic distributions are specified by the distribution format CYCLIC(k), with an argument,
$k \ge 1$, of type integer. CYCLIC(k) defines contiguous segments of length k and
maps them cyclically to the processors. The distribution function is given as follows:

$$\delta(i) = \{\mathrm{MODULO}(\lceil i/k \rceil - 1, NP) + 1\} \quad \text{for all } i,\ 1 \le i \le N$$

Cyclic distributions are specified by the distribution format CYCLIC. This is equivalent
to CYCLIC(1).
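
A short Python sketch of the block-cyclic rule follows. It assumes that processors are numbered from 1 and that the first segment of length k is placed on R(1); the function name `cyclic_owner` is hypothetical.

```python
def cyclic_owner(i, np_, k=1):
    """Owner of A(i) under CYCLIC(k) on R(1:np_), processors numbered from 1."""
    if i < 1 or k < 1:
        raise ValueError("i and k must be positive")
    segment = -(-i // k)                 # 1-based number of the segment holding i
    return (segment - 1) % np_ + 1       # segments are dealt out round-robin


# Segments of length 2 dealt to 3 processors, then plain CYCLIC.
block_cyclic = [cyclic_owner(i, 3, k=2) for i in range(1, 9)]
plain_cyclic = [cyclic_owner(i, 3) for i in range(1, 8)]
```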

4.2 The REDISTRIBUTE Directive 

The REDISTRIBUTE directive is syntactically similar to the DISTRIBUTE directive but 
may appear only in the execution part of a program unit. It is used for dynamically changing 
the distribution of an array and may only be used for arrays that have been declared as 
DYNAMIC. 

If an array B is redistributed, then every array A that is aligned to B is redistributed in
such a way that the relationship expressed by the alignment function linking A to B is kept
invariant (see Section 2.4). If B is a secondary array at the time of redistribution, then the
actual alignment forest changes as follows: B is disconnected from its primary array and made
into a new degenerate tree with primary array B.
5 Alignment Directives


The ALIGN directive is used to distribute data objects indirectly, by specifying one or more 
direct alignment relationships and the associated alignment functions (see Sections 2.3 and 
2.4). 

Every axis of the alignee is specified as either ":" or "*" or an align-dummy, which is a
scalar integer variable. If it is ":" then positions along that axis will be spread out across the
matching axis of the alignment base; if it is "*" then that axis is collapsed: positions along
that axis make no difference in determining the corresponding position of the alignment base.
(Replacing the "*" with an align-dummy not used anywhere else in the directive would have
the same effect; thus this notation is a convenience only.) An align-dummy is considered to
range over all valid index values for that dimension of the alignee.

Each element of the alignee is aligned with all corresponding positions of the alignment 
base. 

5.1 Determining the Alignment Function 

This section describes how an ALIGN directive specifies the alignment function associated 
with the direct alignment relationship between alignee and alignment base. Let 

• A denote the alignee, and $I^A = [L_1 : U_1, \ldots, L_n : U_n]$

• B denote the alignment base, and $I^B = [L'_1 : U'_1, \ldots, L'_m : U'_m]$

The alignment function mapping $I^A$ to the power set of $I^B$ will be denoted by $\alpha$.

Assume that the directive has the form 

ALIGN A($s_1$, ..., $s_n$) WITH B($t_1$, ..., $t_m$)

where 

• each $s_i$ is ":", "*", or an align-dummy

• each $t_j$ is a base-subscript. This can be any of the following cases:

- a dummyless-expr, i.e., a scalar integer expression in which no align-dummy occurs

- a dummy-use-expr, i.e., a scalar integer expression in which exactly one align-dummy occurs

- a subscript-triplet

- "*"
We explain the construction of $\alpha$ by first applying a sequence of transformations to the
directive which eliminate ":" and "*" in the alignee, and subscript-triplets as well as "*" in
the base-subscript-list. The transformations are specified as follows:

• Assume that $s_i$ = ":" matches the subscript triplet $t_j$ = [LT : UT : ST]. Then
$U_i - L_i + 1 \le \mathrm{MAX}(\mathrm{INT}((UT - LT + ST)/ST), 0)$ must hold. The positions in axis i
of the alignee are spread out across axis j of the alignment base:
$s_i$ is replaced by a new align-dummy J, and $t_j$ is replaced by the expression
(J - $L_i$) * ST + LT. (This is analogous to array assignment.)

• Assume that $s_i$ = "*". Then axis i of the alignee is collapsed:
$s_i$ is replaced by a new align-dummy J which occurs nowhere else.

• Assume that $t_j$ = "*". This denotes replication:
$B(t_1, \ldots, t_{j-1}, *, t_{j+1}, \ldots, t_m)$ is replaced by the set
$\{B(t_1, \ldots, t_{j-1}, k, t_{j+1}, \ldots, t_m) \mid L'_j \le k \le U'_j\}$.

By applying these transformations until neither the alignee nor the alignment base contains
positions with either ":" or "*", we obtain:

• a reduced alignee of the form $A(J_1, \ldots, J_n)$, where the $J_i$ are distinct align-dummies.
The range of $J_i$ is given by $[L_i : U_i]$.

• an alignment base set ABS, every element of which has the form $B(y_1, \ldots, y_m)$, where
each $y_j$ is either a dummyless-expr or a dummy-use-expr. The operators +, -, and *
may be applied to form expressions which are linear in the align-dummy. Since
linear expressions cannot handle some frequently occurring cases, such as truncation at
either end of the alignment, we also allow the intrinsic functions MAX, MIN, LBOUND,
UBOUND, and SIZE to be used in alignment functions. Each $J_i$ may occur in at most
one $y_j$ (this excludes the possibility of specifying skew alignments).

The basic rules for determining a are now as follows: 

1. Select an arbitrary tuple $j = (j_1, \ldots, j_n)$, where each $j_i$ is a value in the range of $J_i$,
and substitute $j_i$ for each occurrence of $J_i$ in ABS.

2. Evaluate all expressions in the modified set ABS; this evaluation is performed modulo
the extent of the associated dimension of the alignment base: the value $y$ associated
with dimension $j$ is replaced by $y = \mathrm{MIN}(U'_j, y)$.
Example:


REAL A(1:N), D(1:N,1:M) 

!HPF$ ALIGN A(:) WITH D(:,*) 

aligns a copy of A with every column of D. The reduced alignee has the form A(J), where the
range of J is [1 : N]. For the alignment base set we obtain: ABS = $\{D(J, k) \mid 1 \le k \le M\}$.
Hence, $\alpha(J) = \{(J, k) \mid 1 \le k \le M\}$ for each $J \in [1 : N]$.

Example: 

REAL B(1:N,1:M), E(1:N) 

!HPF$ ALIGN B(:,*) WITH E(:) 

Here, the reduced alignee has the form $B(J_1, J_2)$, where the range of $J_1$ is [1 : N] and
the range of $J_2$ is [1 : M]. For the alignment base set we obtain: ABS = $\{E(J_1)\}$. Thus,
$\alpha(J_1, J_2) = \{(J_1)\}$ for each $J_1 \in [1 : N]$ and $J_2 \in [1 : M]$.
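
The two examples can be reproduced with a small Python sketch that tabulates the alignment function explicitly. The function names `align_replicate` and `align_collapse` are hypothetical, and the tabulation as a dictionary is only practical for small index domains.

```python
def align_replicate(n, m):
    """alpha for ALIGN A(:) WITH D(:,*), with A(1:n) and D(1:n,1:m).

    Each A(J) is aligned with the whole row J of D (replication).
    """
    return {j: {(j, k) for k in range(1, m + 1)} for j in range(1, n + 1)}


def align_collapse(n, m):
    """alpha for ALIGN B(:,*) WITH E(:), with B(1:n,1:m) and E(1:n).

    The second axis of B is collapsed: B(J1, J2) goes to E(J1) for every J2.
    """
    return {(j1, j2): {(j1,)}
            for j1 in range(1, n + 1)
            for j2 in range(1, m + 1)}


rep = align_replicate(2, 3)
col = align_collapse(2, 3)
```

Replication yields sets with M elements per alignee index, whereas collapsing yields singleton sets shared by M alignee indices.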

5.2 The REALIGN Directive 

The REALIGN directive is syntactically similar to the ALIGN directive but may appear only 
in the execution-part of a program unit. It is used for dynamically changing the alignment 
of an array and again may only be used for arrays that have been declared as DYNAMIC. 

Assume that A is the alignee, B the base array with distribution $\delta_R^B$, and $\alpha$ the alignment
function determined by the REALIGN directive. Then the actual alignment forest is modified
as described by the steps below:

1. If A is a primary array at the root of a non-degenerate tree immediately before execution
of the REALIGN directive, then all secondary arrays associated with A are
disconnected from A and made into primary arrays of degenerate trees with their
current distribution.

If A is a secondary array with associated primary array $B'$, then A is disconnected
from $B'$. (Note that $B' = B$ is possible.)

2. A is made a new secondary array of B.

3. The distribution of A is determined as $\delta_R^A = \mathrm{CONSTRUCT}(\alpha, \delta_R^B)$.
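
The forest manipulation in steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the definition of the directive: the forest is modeled as a dictionary from each secondary array to its primary array, the function name `realign` is hypothetical, and the check that the base is a primary array is an assumption derived from the constraints of Section 2.4. Step 3 (recomputing the distribution) is not modeled.

```python
def realign(aligned_to, a, b):
    """Update the alignment forest for a REALIGN of alignee a with base b.

    aligned_to -- dict: secondary array -> its primary array (updated in place)
    """
    # Step 1a: if a is a primary array, its secondary arrays are detached
    # and become primary arrays of degenerate trees.
    for x in [x for x, base in aligned_to.items() if base == a]:
        del aligned_to[x]
    # Step 1b: if a is a secondary array, it is detached from its primary
    # (which may be b itself).
    aligned_to.pop(a, None)
    # Assumed check: an alignment base must not be aligned to another array.
    if b in aligned_to:
        raise ValueError("alignment base must be a primary array")
    # Step 2: a becomes a new secondary array of b.
    aligned_to[a] = b
    return aligned_to


forest = realign({"X": "A", "Y": "A"}, "A", "B")
moved = realign({"A": "C", "Z": "C"}, "A", "B")
```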
6 Allocatable Arrays

Distribution and alignment for variables with the ALLOCATABLE attribute may be specified
using DISTRIBUTE or ALIGN directives. These directives may occur in the specification-part
of a program unit just as for other arrays: the associated attributes are propagated
to each associated ALLOCATE statement. Such variables may also be used in REDISTRIBUTE
and REALIGN directives.

In the following example, distributions are specified for the allocatable arrays A, C and
D which are valid for each allocation instance. When C is allocated in the instance shown,
it is given a cyclic distribution in the executable REDISTRIBUTE directive. At the time 
ALLOCATE is applied to an array B , the array is created according to the alignment given 
in the executable REALIGN statement. The actual alignment forest is modified by entering 
B as a new element in the position determined by the alignment relationships involving B. 
At the time DEALLOCATE is applied to B, the array is removed from the alignment forest 
and each array A directly aligned to B is made into a new tree with primary A. Note that a 
local array which is not declared ALLOCATABLE cannot be aligned in the specification-part 
of a program unit to an allocatable array. 

Example: 

REAL, ALLOCATABLE(:,:) :: A,B
REAL, ALLOCATABLE(:) :: C,D
!HPF$ PROCESSORS PR(32)

!HPF$ DISTRIBUTE A (CYCLIC,BLOCK)

!HPF$ DISTRIBUTE (BLOCK) :: C,D
!HPF$ DYNAMIC B,C


READ 6,M,N
ALLOCATE(A(N*M,N*M))
ALLOCATE(B(N,N))

!HPF$ REALIGN B(:,:) WITH A(M::M,1::M)
ALLOCATE(C(10000), D(10000))
!HPF$ REDISTRIBUTE C (CYCLIC) TO PR
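
To see which elements of A the REALIGN directive in this example pairs with B, the subscript-triplet rule of Section 5.1, (J - L) * ST + LT, can be evaluated directly. The sketch below assumes that B has lower bounds 1 in both dimensions; the function names `triplet_position` and `base_of_b` are hypothetical.

```python
def triplet_position(j, lower, lt, st):
    """Base subscript for alignee subscript j matched against a triplet.

    lower -- lower bound L of the alignee axis
    lt, st -- lower bound and stride of the subscript triplet
    Implements (J - L) * ST + LT.
    """
    return (j - lower) * st + lt


def base_of_b(i, j, m):
    """Element of A with which B(i, j) is aligned by
    REALIGN B(:,:) WITH A(M::M,1::M), assuming B has lower bounds 1."""
    return triplet_position(i, 1, m, m), triplet_position(j, 1, 1, m)


# With M = 3: B(1,1) -> A(3,1), B(2,2) -> A(6,4).
first = base_of_b(1, 1, 3)
second = base_of_b(2, 2, 3)
```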
7 Procedures


The distribution of dummy arguments can be specified as shown below; it can also be 
specified by giving an alignment to another dummy argument or a local data object in the 
usual way. Further, a local data object may be aligned to a dummy argument. 

The alignment tree, as defined in Section 2.4, is local to a procedure. Thus, an array 
which is the actual argument of a procedure call is not connected with its alignment tree in 
the calling unit during execution of the called procedure. 

If a dummy argument is redistributed or realigned during execution of the procedure, 
then the original distribution must be restored on procedure exit. 

The distribution of a dummy argument A can be specified in four different ways: 

1. explicitly by providing a distribution specification of the form: 

DISTRIBUTE A d [TO r] 

where d is a parenthesized distribution format list, and r is the distribution target. Here,
the distribution of the actual argument is changed, if necessary, to the distribution 
determined by the specification (see Section 4.1). If necessary, the distribution of A 
before the call has to be restored upon exit from the procedure. 

2. by inheritance, syntactically expressed by: 

DISTRIBUTE A * 

In this case, the distribution of the actual argument is transferred into the procedure 
and inherited by A. 

3. by inheritance matching, syntactically expressed by: 

DISTRIBUTE A * d [TO r] 

A specification of this form indicates that the distribution of the actual argument is 
transferred into the procedure and inherited by A. However, if this distribution does 
not match the above specification, then the program is not HPF-conforming. 

If this distribution attribute of the dummy is known within the calling routine (through 
the use of interface blocks, for example), then the language processor will arrange for 
remapping the actual argument to the specified distribution (and mapping it back on 
return from the subprogram, if necessary). If the distribution attribute of the dummy 
is not made available when the caller is compiled, the onus is on the programmer to 
arrange for proper distribution of the actual argument. 
4. implicitly: No explicit distribution is specified (directly or indirectly). In this case,
the compiler provides an implicit distribution specification. 

8 The Template Directive in High Performance Fortran

In the above sections, we have presented a model for mapping of data to processor memories 
without using templates. We claim that the HPF template directives are limited in their 
applicability and give rise to serious problems in the specification of the language, without 
adding any significant functionality. 

Template directives, which may occur only in the specification part of a (sub)program, 
result in the creation of a template. Although the language definition states that “templates 
are just abstract index spaces”, it postulates in other places that distinct definitions of 
templates in the same or different scopes are to be considered as different, independent 
of their associated index domain. As a consequence, each template created in a program 
execution must be interpreted as a tagged index domain. 

The discussion in the rest of this section does not include the so-called natural templates 
of HPF: they represent the index domain associated with an array and are thus implicitly 
part of our proposal. In fact, our claim could be rephrased as saying that natural templates 
are sufficient to describe all features related to distribution and alignment. 

8.1 The Usefulness of Templates 

Templates have been perceived to have two separate uses within the language. We discuss 
each of these briefly. 

8.1.1 Alignment of Staggered Grids 

The first use of templates is to enable the specification of alignment between arrays where 
there is no appropriate common index domain: this can occur whenever two or more arrays 
are each associated with different parts of a physical grid which do not completely overlap. 

Before we discuss the general case, we consider the example posted on the HPFF Distribution
mailing list by C. A. Thole:
REAL U(0:N,1:N), V(1:N,0:N), P(1:N,1:N)

!HPF$ TEMPLATE T(0:2*N,0:2*N)

!HPF$ ALIGN P(I,J) WITH T(2*I-1,2*J-1)

!HPF$ ALIGN U(I,J) WITH T(2*I,2*J-1)

!HPF$ ALIGN V(I,J) WITH T(2*I-1,2*J)

P=U(0:N-1,:)+U(1:N,:)+V(:,0:N-1)+V(:,1:N)

For the above code, the claim was made that: 

1. Only a template with a larger index domain than any of the arrays involved represents 
the nature of the physical grid structure correctly. 

2. Therefore the template T is required to specify the relationship between the data objects
precisely: in particular, it is supposed to express the fact that P(I,J) is a neighbor of
U(I,J) and U(I-1,J), but not of U(I+1,J), and similarly for P and V.

3. The actual distribution of the template (which is deliberately omitted) is irrelevant and 
will be chosen in a machine-dependent manner. 

Now, note that whenever two data objects in HPF are aligned with the same element 
of a template, then the language guarantees that these objects will be mapped to the same 
physical processor. But in the above example, all arrays are aligned with disjoint elements 
of the template. As a consequence, only the distribution of the template decides the actual, 
physical neighborhood relation. For example, the distribution 

!HPF$ DISTRIBUTE (CYCLIC,CYCLIC) :: T

results in the worst possible effect, viz. different processor allocations for any two neighbors. 
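
This effect can be checked with a short Python sketch. It assumes a two-dimensional processor grid of shape np1 x np2 and the plain CYCLIC rule for a template dimension with lower bound 0; both the grid shape and the function names are hypothetical choices made for illustration.

```python
def cyclic(t, lower, np_):
    """Processor (1-based) owning template index t under CYCLIC."""
    return (t - lower) % np_ + 1


def owner(t1, t2, np1, np2):
    """Owner of template element T(t1, t2), T(0:2N,0:2N), on an np1 x np2 grid."""
    return cyclic(t1, 0, np1), cyclic(t2, 0, np2)


def neighbors_collocated(n, np1, np2):
    """Count (P element, neighbor) pairs that land on the same processor."""
    same = 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            p = owner(2 * i - 1, 2 * j - 1, np1, np2)          # P(I,J)
            nbrs = [owner(2 * i, 2 * j - 1, np1, np2),         # U(I,J)
                    owner(2 * i - 2, 2 * j - 1, np1, np2),     # U(I-1,J)
                    owner(2 * i - 1, 2 * j, np1, np2),         # V(I,J)
                    owner(2 * i - 1, 2 * j - 2, np1, np2)]     # V(I,J-1)
            same += sum(nb == p for nb in nbrs)
    return same
```

Each neighbor of P(I,J) differs from it by exactly one position along one template axis, so with at least two processors per grid dimension no neighbor is ever on the same processor as P(I,J).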

While an alignment relation between arrays in a program’s data space is a relatively 
natural concept, the template-based code above does not establish one. Hence, this example 
is misleading at best, and would seem to point out a danger associated with the template 
concept rather than a use for it. 

However, the user will certainly desire to specify a collocation of the arrays in the above 
code or similar codes, which can be accomplished by declaring a template of size (N+1,N+1).
It is indeed not possible to correctly specify an HPF alignment (without a template) in this 
situation. Our extension of the HPF alignment directive (which allows restricted usage of 
MAX and MIN), will suffice to permit explicit alignment directives for many cases which 
occur in practice, including this one. Otherwise, the distributions must be specified explicitly.
Given a suitable definition of the block distribution, one way to perform the required
distributions is the following:†

REAL U(0:N,1:N), V(1:N,0:N), P(1:N,1:N)

!HPF$ DISTRIBUTE (BLOCK,BLOCK) :: U,V,P


P=U(0:N-1,:)+U(1:N,:)+V(:,0:N-1)+V(:,1:N)

The language proposal contained in this paper offers a much more general solution, by 
providing a generalized form of block distribution. 
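
One reading of the footnote's remark about the HPF definition of BLOCK is that a dimension of extent N+1 (such as the first dimension of U) and one of extent N (such as that of P) receive different block sizes exactly when the number of processors divides N. The sketch below checks only this arithmetic, assuming the HPF block size is ceil(extent/NP); the function names are hypothetical, and the Vienna Fortran definition is not modeled.

```python
def hpf_block_size(extent, np_):
    """Block size ceil(extent / np_) assumed for the HPF BLOCK format."""
    return -(-extent // np_)


def sizes_differ(n, np_):
    """True if extents n + 1 (dimension 0:N) and n (dimension 1:N)
    get different block sizes on np_ processors."""
    return hpf_block_size(n + 1, np_) != hpf_block_size(n, np_)


# Example: N = 12 on 4 processors gives block sizes 4 and 3;
# N = 13 on 4 processors gives block size 4 for both extents.
example = (sizes_differ(12, 4), sizes_differ(13, 4))
```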

8.1.2 Passing Array Sections to Subroutines 

The second perceived use for a template directive was to permit the explicit declaration of 
mappings of array sections in subroutines: 

REAL A(1000)

!HPF$ DISTRIBUTE A (CYCLIC(3))

CALL SUB(A(2:996:2))

SUBROUTINE SUB(X)

REAL X(:) ! X inherits its distribution

We assume that the dummy argument in subroutine SUB inherits its distribution from 
the actual argument. 

The question raised here is: 

how can the mapping of X be declared in SUB if one wants to specify it explicitly? 

Now one will, in general, not want to explicitly specify such a distribution: the relatively 
high cost associated with data movement on the current generation of parallel computers 
means that a subroutine will usually be written so that it is invoked with distributed ar- 
guments and the dummy arguments will indeed inherit the distribution from the actual 
argument as above. However, just as we write one subroutine to handle arrays of different
sizes, so one expects such a subroutine to accept arrays with different distributions. In
those cases where a subroutine is important enough to warrant a specific redistribution of
its arguments, or if this should be necessary for some reason, then the language provides the
constructs required to prescribe the mappings.

†Here the Vienna Fortran definition of BLOCK is assumed. With the HPF definition, this will cause a
problem if and only if the number of processors divides N exactly.

Templates were seen as a solution to the problem of providing distributions such as that 
of X above explicitly, should it be deemed necessary: 

SUBROUTINE SUB(X) 

!HPF$ TEMPLATE T(1000) 

!HPF$ ALIGN X(I) WITH T(2*I) 

!HPF$ DISTRIBUTE T (CYCLIC(3))
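
That these directives reproduce the inherited mapping can be checked with a minimal Python sketch. It assumes a hypothetical linear arrangement of 4 processors and the usual CYCLIC(k) rule, in which blocks of k consecutive elements are dealt out to the processors in round-robin order. The names used are illustrative and not part of HPF.

```python
NP = 4  # hypothetical number of processors


def cyclic_k_owner(index, k, nprocs):
    """Owner of a 1-based index under CYCLIC(k)."""
    return ((index - 1) // k) % nprocs


# DISTRIBUTE A (CYCLIC(3)) for REAL A(1000): owner of every element of A.
a_owner = {j: cyclic_k_owner(j, 3, NP) for j in range(1, 1001)}

# The actual argument A(2:996:2); X(i) is associated with section[i - 1].
section = list(range(2, 997, 2))
inherited = [a_owner[j] for j in section]

# ALIGN X(I) WITH T(2*I) together with DISTRIBUTE T (CYCLIC(3)).
via_template = [cyclic_k_owner(2 * i, 3, NP) for i in range(1, len(section) + 1)]

print(inherited[:6])
print(inherited == via_template)
```

The agreement depends on the alignment function 2*I matching the section triplet 2:996:2 of this particular call, which is why the template fixes the subroutine to one call pattern.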

The template does help to specify this distribution in the example, but at the above- 
mentioned cost of a loss of generality for the entire subroutine. Note, further, that the same 
effect can be achieved by passing the entire array A to the subroutine and either using the 
array section explicitly or, if it is passed as a separate argument, repeating the alignment of 
the argument as above: 

SUBROUTINE SUB(A,X)

REAL A(1000)

!HPF$ ALIGN X(I) WITH A(2*I)

!HPF$ DISTRIBUTE A * (CYCLIC(3))

(The asterisk indicates that the distribution of A is inherited). But recall that if there is 
another call site for this subroutine with a different actual argument for X, then neither of 
these solutions will be of any use. Instead, inquiry functions must be used to determine the 
properties of alignments and/or distributions passed into the subroutine. 

The current definition of HPF further attempts to facilitate the manipulation of the
distributions of sections of arrays passed to subroutines by introducing the INHERIT directive,
which removes the need for explicit use of templates in this situation (albeit at the
cost of introducing a host of new syntactic and semantic difficulties).

The main reason for this problem is an unfortunate shortcoming of the current HPF language
specification: in contrast to, for example, Kali or Vienna Fortran, which include the
concept of user-defined distribution functions, HPF cannot describe explicitly every
distribution that it can actually generate.
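
The section example of Section 8.1.2 gives a concrete instance, sketched below in Python. Under the assumption of a linear arrangement of processors and the usual CYCLIC(k) rule, the sketch searches for a block size k such that CYCLIC(k), applied directly to the dummy X, reproduces the mapping that X inherits from A(2:996:2). The helper names are hypothetical; BLOCK is covered as well, since it coincides with CYCLIC(k) for a sufficiently large k.

```python
def cyclic_k_owner(index, k, nprocs):
    """Owner of a 1-based index under CYCLIC(k)."""
    return ((index - 1) // k) % nprocs


def matching_block_sizes(nprocs, section=range(2, 997, 2)):
    """Block sizes k for which CYCLIC(k) on X equals the inherited mapping.

    The inherited mapping places X(i) where A(section[i-1]) resides,
    with A distributed CYCLIC(3) over nprocs processors.
    """
    inherited = [cyclic_k_owner(j, 3, nprocs) for j in section]
    n = len(inherited)
    return [
        k
        for k in range(1, n + 1)
        if [cyclic_k_owner(i, k, nprocs) for i in range(1, n + 1)] == inherited
    ]


# Hypothetical machine with 4 processors.
print(matching_block_sizes(4))
```

On 4 processors the inherited owners of X(1), X(2), ... follow the pattern 0, 1, 1, 2, 3, 3, with runs of unequal length, so no uniform block size can match and the search returns an empty list.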

8.2 Language Problems with Templates 

We now reiterate the two major problems caused by templates in the HPF language definition.
Note that templates are not first-class objects of the language: in particular, templates
cannot be defined as being ALLOCATABLE. Furthermore, they cannot be passed
as arguments to subroutines.

1. Templates cannot handle allocatable arrays: 

While the shape of templates is determined at entry to a program unit and cannot be 
changed afterwards, an allocatable array may be subject to multiple ALLOCATE and 
DEALLOCATE statements, where the extents of the dimensions associated with each 
instance may depend on run-time and input values. There is no way in which HPF 
can establish a direct relationship between the shape of an instance of an allocatable 
array, and the shape of an associated template. 

Methods to avoid this dilemma would include the definition of allocatable templates,
or of infinite templates (neither of which is a serious alternative).

2. Templates cannot be passed across procedure boundaries: 

A data object whose distribution is described by a template may be passed to a sub- 
program in such a way that the dummy inherits the distribution. If we need to describe 
the distribution of the dummy argument, then we must be able to refer to the template 
of the actual (see above example). In HPF this would require the passing of templates 
to the subprogram as well. The INHERIT option for dummy arguments in the cur- 
rent HPF definition tries to achieve exactly that, introducing an element of maximum 
surprise for the user. The above example could be written as follows: 

REAL A(1000)

!HPF$ DISTRIBUTE A (CYCLIC(3))

CALL SUB(A(2:996:2))

SUBROUTINE SUB(X)

REAL X(:)

!HPF$ INHERIT :: X
!HPF$ DISTRIBUTE X * (CYCLIC(3))

The idea here is that the distribution specified for X is not the distribution of the 
dummy argument, i.e., the distribution of the array section A(2:996:2), but that of the 
array associated with the actual argument.

In contrast, the distributions defined in the language proposal of this paper (as well as
in Vienna Fortran) are considered to be an attribute of an array, and they are handled 
that way as well. Even in the case of inherited distributions which cannot be explicitly 
specified, inquiry functions can be used to determine every aspect of the distribution 
passed into the procedure. 

9 Related Work 

Many of the concepts and constructs used in the above language proposal, and in the HPF 
specification, are not new. Processor arrays and the distribution of data to them were 
first used for distributed memory machines in the Kali programming language [9]. They 
were further refined in the Vienna Fortran language, where processor arrays could also be 
reshaped, now expressed by means of the HPF VIEW attribute. A major difference in the 
handling of processor arrays is, however, that Vienna Fortran supports the mapping of data 
to subsets of processor arrays and provides a canonical mapping of processor arrays to a 
linear processor array, to facilitate the portability of code. 

The Vienna Fortran language [1, 3, 12] is based both upon Kali and upon experience 
gained with the SUPERB parallelization system ([7, 11, 13]); it provides the user with a 
wide range of facilities for mapping data structures to processors, including those proposed 
in this paper and user-defined distributions. Vienna Fortran was the first language in which 
the issues of distribution handling at subroutine boundaries were investigated in depth. It 
introduced the concept of inheriting and of enforcing distributions and provided an attribute 
to enable the user to make assertions about the distributions of actual arguments. This 
language was also the first to make the distinction between static and dynamic distributions. 

Among other things, the mapping of data to subsets of processors and the inheritance 
of distributions have been implemented within the framework of the Vienna Fortran Com- 
pilation System. Two variants of the general block distribution used in this paper, but not 
included in HPF, have also been implemented. 

The programming language Fortran D [6] proposes a Fortran language extension in which 
the programmer specifies the distribution of data by aligning each array to a decomposition, 
which corresponds to a template, and then specifying a distribution of the decomposition 
to a virtual machine. These are executable statements, and array distributions are dynamic 
only. 

The Yale Extensions [4] specify the distribution of arrays in three stages: alignment, 
partition and a physical map. Because all these stages are modeled as bijective functions 
between index domains, data replication is not possible. By restricting the scope of layout 
directives to phases, a block structure is imposed on Fortran 90.

Cray Research Inc. has announced a set of language extensions to Cray Fortran (cf77) [10] 
which enable the user to specify the distribution of data and work. They provide intrinsics for 
data distribution and permit redistribution at subroutine boundaries. Further, they permit 
the user to structure the executing processors by giving them a shape and weighting the 
dimensions. Several methods for distributing iterations of loops are provided. 

10 Conclusions 

An approach which substantially reduces the cost of developing codes for distributed memory 
parallel machines is to provide a set of extensions for sequential languages (in particular, 
Fortran and C). These extensions should be portable across a wide range of architectures 
and should suffice for a wide variety of algorithms. The methods by which the user may 
distribute data to the processors are the central feature of such a language, and should be as 
natural and as flexible as possible. In this paper, we have presented in detail such a model 
for distribution and alignment of data. This model is both simpler and more general than 
the current High Performance Fortran model. In particular, it does not require a template 
directive and has simplified the passing of distributed arguments to subroutines. On the 
other hand, the concept of distribution functions has been generalized. A full description of 
the model described in this paper can be found in [2]. 